open wedge
Combinatorial Approximations for Cluster Deletion: Simpler, Faster, and Better
Balmaseda, Vicente, Xu, Ying, Cao, Yixin, Veldt, Nate
Graph clustering is a fundamental task in graph mining where the goal is to partition nodes of a graph into disjoint clusters that have dense internal connections but are only sparsely connected to the rest of the graph. This has a wide variety of applications which include detecting communities in social networks [Fortunato, 2010], identifying related genes in biological networks based on gene expression profiles [Ben-Dor et al., 1999], and finding groups of pixels in an image that belong to the same object [Shi and Malik, 2000]. An idealized notion of a cluster in a graph is a set of nodes that is completely connected internally (i.e., a clique) while being completely disconnected from the rest of the graph. Cluster graph modification problems [Shamir et al., 2004] are a class of graph clustering objectives that seek to edit the edges in a graph as little as possible in order to achieve this idealized structure. One widely studied problem is correlation clustering [Bansal et al., 2004], which can be cast as adding or deleting a minimum number of edges to convert a graph into a disjoint union of cliques. This problem is also known as cluster editing.
Faster Approximation Algorithms for Parameterized Graph Clustering and Edge Labeling
Graph clustering is a fundamental task in network analysis where the goal is to detect sets of nodes that are well-connected to each other but sparsely connected to the rest of the graph. We present faster approximation algorithms for an NP-hard parameterized clustering framework called LambdaCC, which is governed by a tunable resolution parameter and generalizes many other clustering objectives such as modularity, sparsest cut, and cluster deletion. Previous LambdaCC algorithms are either heuristics with no approximation guarantees, or computationally expensive approximation algorithms. We provide fast new approximation algorithms that can be made purely combinatorial. These rely on a new parameterized edge labeling problem we introduce that generalizes previous edge labeling problems that are based on the principle of strong triadic closure and are of independent interest in social network analysis. Our methods are orders of magnitude more scalable than previous approximation algorithms and our lower bounds allow us to obtain a posteriori approximation guarantees for previous heuristics that have no approximation guarantees of their own.